Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards



Similar resources

Consolidation power of extrinsic rewards: reward cues enhance long-term memory for irrelevant past events.

Recent research suggests that extrinsic rewards promote memory consolidation through dopaminergic modulation processes. However, no conclusive behavioral evidence exists given that the influence of extrinsic reward on attention and motivation during encoding and consolidation processes are inherently confounded. The present study provides behavioral evidence that extrinsic rewards (i.e., moneta...


Achieving Master Level Play in 9 x 9 Computer Go

The UCT algorithm uses Monte-Carlo simulation to estimate the value of states in a search tree from the current state. However, the first time a state is encountered, UCT has no knowledge, and is unable to generalise from previous experience. We describe two extensions that address these weaknesses. Our first algorithm, heuristic UCT, incorporates prior knowledge in the form of a value function...



Reinforcement Learning Without Rewards

Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” prob...


Oscillatory rhythm of reward: anticipation and processing of rewards in children with and without autism

Background: Autism spectrum disorder (ASD) is a complex neurodevelopmental condition, and multiple theories have emerged concerning core social deficits. While the social motivation hypothesis proposes that deficits in the social reward system cause individuals with ASD to engage less in social interaction, the overly intense world hypothesis (sensory over-responsivity) proposes that individuals...



Journal

Journal title: Proceedings of the AAAI Conference on Artificial Intelligence

Year: 2020

ISSN: 2374-3468, 2159-5399

DOI: 10.1609/aaai.v34i04.6040